Opportunities and challenges in using AI Chatbots in Higher Education
Artificial intelligence (AI) conversational chatbots have gained popularity over time, and have been widely used in the fields of e-commerce, online banking, and digital healthcare and well-being, among others. The technology has the potential to provide personalised service to a range of consumers. However, the use of chatbots within educational settings is still limited. In this paper, we present three chatbot prototypes that the Warwick Manufacturing Group, University of Warwick, is currently developing, and discuss the potential opportunities and technical challenges we face when considering AI chatbots to support our daily activities within the department. Three AI virtual agents are under development: 1) to support the delivery of a taught Master's course simulation game; 2) to support the training and use of a newly introduced educational application; 3) to improve the processing of helpdesk requests within a university department. We hope this paper is informative to those interested in using chatbots in the educational domain. We also aim to improve awareness among those within the chatbot development industry, in particular the chatbot engine providers, about the educational and operational needs within educational institutes, which may differ from those in other domains.
SOA services in higher education
Service Oriented Architecture (SOA) is a recent architectural framework for distributed software system development in which software components are packaged as Services. It has become increasingly popular in academia and in industry, but has been principally used in the business domain. In higher education, however, SOA has rarely been applied or investigated. In this paper, we propose the idea of applying SOA technologies in the education domain, to increase both interoperability and flexibility within the e-learning environment. We expect that both students and teachers in higher educational institutions can benefit from this approach. We also describe a number of possible SOA services, along with a high-level service roadmap to support a university's learning and teaching activities.
Relative depth estimation from single monocular images with deep convolutional network
Field of study: Computer science. Dr. Grant Scott, Thesis Supervisor. "December 2017." Depth estimation from single monocular images is a theoretical challenge in computer vision as well as a computational challenge in practice. This thesis addresses the problem of depth estimation from single monocular images using a deep convolutional neural fields framework, which consists of convolutional feature extraction, superpixel dimensionality reduction, and depth inference. Data were collected using a stereo vision camera, which generated depth maps through triangulation that are paired with visual images. The visual image (input) and computed depth map (desired output) are used to train the model, which has achieved 83 percent test accuracy at the standard 25 percent tolerance. The problem has been formulated as depth regression for superpixels, and our technique is superior to existing state-of-the-art approaches based on its demonstrated generalization ability, high prediction accuracy, and real-time processing capability. We utilize the VGG-16 deep convolutional network as the feature extractor and conditional random fields for depth inference. We have leveraged a multi-phase training protocol that includes transfer learning and network fine-tuning, which leads to high accuracy. Our framework has a robust modular nature, with the capability of replacing each component with different implementations for maximum extensibility. Additionally, our GPU-accelerated implementation of superpixel pooling has further facilitated this extensibility by allowing incorporation of feature tensors with flexible shapes and has provided both space and time optimization. Based on our novel contributions and high-performance computing methodologies, the model achieves a minimal and optimized design. It is capable of operating at 30 fps, which is a critical step towards empowering real-world applications such as autonomous vehicles with passive relative depth perception using single camera vision-based obstacle avoidance, environment mapping, etc. Includes bibliographical references (pages 61-65)
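The accuracy figure quoted in this abstract is given "at the standard 25 percent tolerance". A minimal sketch of one common reading of that metric follows, assuming it means the fraction of predictions whose ratio to the ground truth (in either direction) stays below 1.25. The function name `threshold_accuracy` and the sample values are illustrative assumptions, not taken from the thesis.

```python
import numpy as np


def threshold_accuracy(pred, target, tolerance=1.25):
    """Fraction of valid pixels where max(pred/target, target/pred) < tolerance.

    Assumption: "25 percent tolerance" is read as a ratio threshold of 1.25.
    Pixels with non-positive depth are ignored.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    valid = (pred > 0) & (target > 0)
    ratio = np.maximum(pred[valid] / target[valid], target[valid] / pred[valid])
    return float(np.mean(ratio < tolerance))


# Illustrative depths in metres (made-up values, not thesis data).
pred = np.array([1.0, 2.0, 3.0, 10.0])
target = np.array([1.1, 2.0, 4.0, 10.5])
accuracy = threshold_accuracy(pred, target)
print(accuracy)
```

Only the third sample (3.0 against 4.0, a ratio of about 1.33) falls outside the tolerance here, so three of the four samples count as correct.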
An effective services framework for sharing educational resources
Nowadays, the growing number of software tools to support e-learning and the data
they rely upon are valuable resources, supporting different aspects of the complex
learning and teaching processes, including designing learning content, delivering
learning activities, and evaluating students’ learning performance. However, sharing
these educational resources efficiently and effectively is a challenge: there are many
resources, these have not been described accurately and in general they do not interoperate,
and it is common for the tools to rely on different technologies. This thesis
explores a solution – a novel educational services framework – to improve the sharing
of current e-resources, by applying the latest service technologies in the context of
higher education. Our findings suggest that the proposed framework is effective in dealing
with the technical and educational issues in resource discovery, interoperability and
reusability; however, there are still technical challenges remaining for implementing
this service framework.
This research is divided into 3 phases. The first phase investigates the sharing of e-learning
resources through a literature survey, and identifies limitations on current
developments. In the second phase, the current problems relating to resource sharing
are addressed by a proposed educational service framework, which contains both
educational and technical components. Through a case study, nine e-learning services
and their dataflows are identified. To determine the technical components of the
framework, a novel Educational Service Architecture is proposed, which allows
resources to be better described, structured and connected, by following the principles
of discoverability, interoperability and reusability in service technologies. In the third
phase, part of the framework is implemented and evaluated by two studies. In the first
study, users’ experiences were collected via a simulation experiment, to compare the
effectiveness of a service prototype with that of the use of current technologies. During
the second part of the evaluation, technical challenges for implementing the services
framework were identified via a case study, involving the implementation of another
service prototype.
Efficient algorithms for scalable video coding
A scalable video bitstream specifically designed for the needs of various client terminals,
network conditions, and user demands is much desired in current and future video transmission
and storage systems. The scalable extension of the H.264/AVC standard (SVC) has
been developed to satisfy the new challenges posed by heterogeneous environments, as
it permits a single video stream to be decoded fully or partially with variable quality, resolution,
and frame rate in order to adapt to a specific application. This thesis presents
novel improved algorithms for SVC, including: 1) a fast inter-frame and inter-layer coding
mode selection algorithm based on motion activity; 2) a hierarchical fast mode selection
algorithm; 3) a two-part Rate Distortion (RD) model targeting the properties of different
prediction modes for the SVC rate control scheme; and 4) an optimised Mean Absolute
Difference (MAD) prediction model.
The proposed fast inter-frame and inter-layer mode selection algorithm is based on the
empirical observation that a macroblock (MB) with slow movement is more likely to be
best matched by one in the same resolution layer. However, for a macroblock with fast
movement, motion estimation between layers is required. Simulation results show that
the algorithm can reduce the encoding time by up to 40%, with negligible degradation in
RD performance.
The proposed hierarchical fast mode selection scheme comprises four levels and makes
full use of inter-layer, temporal and spatial correlation as well as the texture information of
each macroblock. Overall, the new technique demonstrates the same coding performance
in terms of picture quality and compression ratio as that of the SVC standard, yet produces
a saving in encoding time of up to 84%. Compared with state-of-the-art SVC fast mode
selection algorithms, the proposed algorithm achieves a superior computational time reduction
under very similar RD performance conditions.
The existing SVC rate distortion model cannot accurately represent the RD properties of
the prediction modes, because it is influenced by the use of inter-layer prediction. A separate
RD model for inter-layer prediction coding in the enhancement layer(s) is therefore
introduced. Overall, the proposed algorithms improve the average PSNR by up to 0.34dB
or produce an average saving in bit rate of up to 7.78%. Furthermore, the control accuracy
is maintained to within 0.07% on average.
As a MAD prediction error always exists and cannot be avoided, an optimised MAD prediction
model for the spatial enhancement layers is proposed that considers the MAD from
previous temporal frames and previous spatial frames together, to achieve a more accurate MAD prediction.
Simulation results indicate that the proposed MAD prediction model
reduces the MAD prediction error by up to 79% compared with the JVT-W043 implementation.
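For readers unfamiliar with the quantity being predicted, the sketch below shows how a Mean Absolute Difference is computed for a block, followed by a purely hypothetical way of combining a temporal and a spatial MAD estimate. The abstract does not give the thesis's actual prediction model, so `blended_mad_prediction` and its `weight` parameter are assumptions made only for illustration.

```python
import numpy as np


def mean_absolute_difference(original, predicted):
    """Mean of the absolute per-pixel differences between two equally sized blocks."""
    original = np.asarray(original, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(original - predicted)))


def blended_mad_prediction(mad_temporal, mad_spatial, weight=0.5):
    """Hypothetical convex blend of a temporal and a spatial MAD estimate.

    This is NOT the model from the thesis; it only illustrates the idea of
    considering both sources together.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * mad_temporal + (1.0 - weight) * mad_spatial


# Made-up 2x2 luma blocks for illustration.
block = np.array([[10, 12], [14, 16]])
reference = np.array([[11, 12], [12, 19]])
mad = mean_absolute_difference(block, reference)
blended = blended_mad_prediction(2.0, 4.0, weight=0.25)
print(mad, blended)
```

The absolute differences in the sample blocks are 1, 0, 2 and 3, so their mean is 1.5; the blend with weight 0.25 gives 0.25 * 2.0 + 0.75 * 4.0 = 3.5.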
Discovering gated recurrent neural network architectures
Reinforcement Learning agent networks with memory are a key component in solving POMDP tasks.
Gated recurrent networks such as those composed of Long Short-Term
Memory (LSTM) nodes have recently been used to improve
the state of the art in many supervised sequential processing tasks such as speech
recognition and machine translation. However, scaling them to deep
memory tasks in the reinforcement learning domain is challenging because of sparse and deceptive
reward functions. To address this challenge, a new secondary optimization objective is first introduced
that maximizes the information (Info-max) stored in
the LSTM network. Results indicate that when combined with neuroevolution, Info-max can discover powerful
LSTM-based memory solutions that outperform traditional
RNNs. Next, for the supervised learning tasks, neuroevolution techniques are employed
to design new LSTM architectures. Such architectural variations include
discovering new pathways between the recurrent layers as well as designing new gated
recurrent nodes. This dissertation proposes evolution of a tree-based
encoding of the gated memory nodes, and shows that it makes
it possible to explore new variations more effectively than other
methods. The method discovers nodes with multiple recurrent paths
and multiple memory cells, which lead to significant improvement
in the standard language modeling benchmark task. The dissertation also
shows how the search process can be speeded up by training an
LSTM network to estimate performance of candidate structures, and
by encouraging exploration of novel solutions. Thus, evolutionary
design of complex neural network structures promises to improve
performance of deep learning architectures beyond human ability
to do so. Computer Science
activePDF-Toolk
This document provides information for deploying activePDF Toolkit Professional in a development environment. This document is organized into four sections: Getting Started, Tutorials, Technical Reference and the Toolkit Appendices. The Getting Started section covers setup and installation, and includes a product overview and information related to operating Toolkit Professional. Tutorials includes examples of many Toolkit features, including PDF generation and form filling. All of the tutorials can be used with activePDF Toolkit. Technical Reference provides detailed information on Toolkit’s objects, subobjects, methods and properties.
Interactive Manipulation of 3D Scene Projections
Linear perspective is a good approximation to the format in which the human visual system conveys 3D scene information to the brain. Artists expressing 3D scenes, however, create nonlinear projections that balance their linear perspective view of a scene with elements of aesthetic style, layout and relative importance of scene objects. Manipulating the many parameters of a linear perspective camera to achieve a desired view is not easy. Controlling and combining multiple such cameras to specify a nonlinear projection is an even more cumbersome task. This paper presents a direct interface, where an artist manipulates in 2D the desired projection of a few features of the 3D scene. The features represent a rich set of constraints which define the overall projection of the 3D scene. Desirable properties of local linear perspective and global scene coherence drive a heuristic algorithm that attempts to interactively satisfy the sketched constraints as a weight-averaged projection of a minimal set of linear perspective cameras. This paper shows that 2D feature constraints are a direct and effective approach to control both the 2D layout of scene objects and the conceptually complex, high dimensional parameter space of nonlinear scene projection. The simplicity of our interface also makes it an appealing alternative to standard through-the-lens and widget based techniques to control a single linear perspective camera.
Building robust and modular question answering systems
Over the past few years, significant progress has been made in QA systems due to the availability of annotated datasets on a large scale and the impressive advancements in large-scale pre-trained language models. Despite these successes, the black-box nature of end-to-end trained QA systems makes them hard to interpret and control. When these systems encounter inputs that deviate from their training data distribution or are subjected to adversarial perturbations, their performance tends to deteriorate by a large margin. Furthermore, they may occasionally produce unanticipated results, potentially leading to confusion among users. Additionally, this deficiency in robustness and interpretability poses challenges when deploying such models in real-world scenarios.
In this dissertation, we aim to build robust QA systems by explicitly decomposing various QA tasks into distinct sub-modules, each responsible for a particular aspect of the overall QA process. Through this decomposition, we seek to achieve improved performance in terms of both the system's ability to handle diverse and challenging inputs (robustness) and its capacity to provide transparent and explainable reasoning (interpretability).
To address the aforementioned limitations, in this dissertation, we aim to build robust QA models by explicitly decomposing different QA tasks into different sub-modules. We argue that utilizing these sub-modules can substantially improve the robustness and interpretability of different QA systems. In the first half of this dissertation, we introduce three sub-modules to mitigate the dataset artifacts that models learn from datasets. These sub-modules also enable us to examine and exert explicit control over the intermediate outputs. In the first work, to address question answering that requires multi-hop reasoning, we propose a chain extractor, which extracts the reasoning chains necessary for models to derive the final answer. The reasoning chains not only prevent the model from exploiting reasoning shortcuts but also provide an explanation of how the answer is derived. In the second work, we incorporate an alignment layer between the question and the context before generating the answer. This alignment layer can help us interpret the models' behavior and improve robustness in adversarial settings. In the third work, we add an answer verifier after QA models generate the answer. By utilizing external NLI datasets and models, this verifier can boost QA models' prediction confidence across several different domains and help us spot cases where QA models predict the right answer for the wrong reason.
In the second half of this dissertation, we tackle the problem of complex fact-checking in the real world by treating it as a modularized QA task. We first decompose a complex claim into several yes-no subquestions whose answers directly contribute to the veracity of the claim. Then, each subquestion is fed into a commercial search engine to retrieve relevant documents. Additionally, we extract the relevant snippets in the retrieved documents and use a GPT3-based summarizer to generate the core evidence for checking the claim. We show that the decompositions can play an important role in both evidence retrieval and veracity composition of an explainable fact-checking system. Also, we show that the GPT3-based evidence summarizer generates faithful summaries of documents most of the time, indicating it can be used as an effective part of the pipeline. Moreover, we annotate a dataset, ClaimDecomp, containing 1,200 complex claims and their decompositions. We believe that this dataset can further promote building explainable fact-checking systems and analyzing complex claims in the real world. Computer Science
From active to passive spatial acoustic sensing and applications
The active acoustic sensing system emits modulated acoustic waves and analyzes reflection signals. It is dominant in acoustic spatial sensing. On the other hand, the passive acoustic sensing system receives and investigates natural sounds directly. It is good at semantic tasks but has weak performance on spatial sensing. In this dissertation, we bridge three gaps in existing systems: the gap between the assumptions of signal processing algorithms and the real acoustic environment, the gap between powerful active spatial sensing and limited passive spatial sensing, and the gap between the semantic features and spatial information. We evolve the acoustic sensing system design and extend its functionalities with three novel systems.
First, we develop a fully active spatial sensing system, DeepRange, which can adapt to the real environment easily. We develop an effective mechanism to generate synthetic training data that captures noise, speaker/mic distortion, and interference in the signals. It removes the need to collect a large volume of data. We then design a deep range neural network (DRNet) to estimate the distance from raw acoustic signals. It is inspired by the signal processing insight that an ultra-long convolution kernel helps to combat noise and interference. The model is trained entirely on synthetic data, but it can robustly achieve sub-centimeter error on real data despite various environments, background noise, interference, and mobile phone models.
Second, we develop a fused active and passive spatial sensing system for speech separation, denoted Spatial Aware Multi-task learning-based Separation (SAMS). We leverage both active sensing and passive sensing to improve AoA estimation and jointly optimize the semantic task and the spatial task. SAMS simultaneously estimates the spatial location and extracts speech for the target user during teleconferencing. We first generate fine-grained spatial embeddings from the user’s voice and inaudible tracking sound, which contain the user’s position and rich multipath information. Furthermore, we develop a deep neural network with multi-task learning to jointly optimize source separation and location. We significantly speed up inference to provide a real-time guarantee.
Finally, we deeply fuse the semantic features and spatial cues to combat the interference and noise in the real environment as well as enable depth sensing in a fully passive setup. Inspired by the "flash-to-bang" phenomenon (i.e., hearing the thunder after seeing the lightning), we propose FBDepth to measure the depth of the sound source. We formulate the problem as an audio-visual event localization task for collision events. Specifically, FBDepth first aligns correspondence between the video track and audio track to locate the target object and target sound in a coarse granularity. Based on the observation of moving objects’ trajectories, it proposes to estimate the intersection of optical flow before and after the collision to locate video events in time. It feeds the estimated timestamp of the video event and the other modalities into the final depth estimation. We use a mobile phone to collect 3.6K+ video clips involving 24 different objects at up to 60m. FBDepth shows superior performance, especially at long range, compared to monocular and stereo methods. Computer Science
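The physical relation behind the "flash-to-bang" idea can be sketched in a few lines: light arrives almost instantly, so the delay between seeing an event and hearing it, multiplied by the speed of sound, approximates the distance to the source. This is only the underlying arithmetic, not the FBDepth system itself, which the abstract describes as a learned audio-visual model. The speed of sound value (about 343 m/s in dry air at 20 °C) and the function name are assumptions for illustration.

```python
# Approximate speed of sound in dry air at 20 °C (assumed value).
SPEED_OF_SOUND_M_PER_S = 343.0


def flash_to_bang_depth(t_visual_s, t_audio_s, speed_of_sound=SPEED_OF_SOUND_M_PER_S):
    """Estimate source distance in metres from the audio-visual delay.

    t_visual_s: time the event is seen (light travel time is neglected).
    t_audio_s: time the event is heard at the same device.
    """
    delay = t_audio_s - t_visual_s
    if delay < 0:
        raise ValueError("audio cannot arrive before the visual event")
    return speed_of_sound * delay


# A collision seen at t = 1.000 s and heard at t = 1.100 s.
depth = flash_to_bang_depth(1.000, 1.100)
print(round(depth, 1))
```

Under this assumed speed of sound, a source at the 60 m range mentioned in the abstract would produce a delay of roughly 0.175 s, so locating the video and audio events precisely in time matters a great deal.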